
    Toward a Principle-Based Translator

    A principle-based computational model of natural language translation consists of two components: (1) a module which makes use of a set of principles and parameters to transform the source language into an annotated surface form that can be easily converted into a "base" syntactic structure; and (2) a module which makes use of the same set of principles, but a different set of parameter values, to transform the "base" syntactic structure into the target language surface structure. This proposed scheme of language translation is an improvement over existing schemes since it is based on interactions between principles and parameters rather than on complex interactions between language-specific rules as found in older schemes. The background for research of the problem includes: an examination of existing schemes of computerized language translation and an analysis of their shortcomings. Construction of the proposed scheme requires a preliminary investigation of the common "universal" principles and parametric variations across different languages within the framework of current linguistic theory. The work to be done includes: construction of a module which uses linguistic principles and source language parameter values to parse and output the corresponding annotated surface structures of source language sentences; creation of procedures which handle the transformation of an annotated surface structure into a "base" syntactic structure; and development of a special purpose generation scheme which converts a "base" syntactic structure into a surface form in the target language. (MIT Artificial Intelligence Laboratory)
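
    A minimal sketch of the two-module organization described above, assuming hypothetical parameter names (head_initial, null_subject) and toy tree structures; the actual principles and parameter inventory of the proposed translator are not reproduced here.

```python
# Illustrative sketch only: one set of "universal" principles, two sets of
# parameter values, shared by the parsing and generation modules.
from dataclasses import dataclass, field

@dataclass
class Parameters:
    """Language-specific parameter values (names are illustrative)."""
    head_initial: bool   # e.g. English-type: True, Korean-type: False
    null_subject: bool   # whether subjects may be dropped

@dataclass
class Node:
    """Toy stand-in for annotated surface / 'base' syntactic structures."""
    label: str
    children: list = field(default_factory=list)

def parse_to_base(sentence: str, params: Parameters) -> Node:
    """Module 1: principles + source parameters -> annotated surface form,
    then conversion into a 'base' syntactic structure (placeholder)."""
    return Node("IP", [Node(tok) for tok in sentence.split()])

def generate_surface(base: Node, params: Parameters) -> str:
    """Module 2: same principles, target parameter values -> surface string."""
    words = [child.label for child in base.children]
    # Toy parameter use: head-final targets reverse the linear order.
    return " ".join(words if params.head_initial else list(reversed(words)))

english = Parameters(head_initial=True, null_subject=False)
korean = Parameters(head_initial=False, null_subject=True)

base = parse_to_base("the system translates sentences", english)
print(generate_surface(base, korean))
```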

    LEXICALL: Lexicon Construction for Foreign Language Tutoring

    We focus on the problem of building large repositories of lexical conceptual structure (LCS) representations for verbs in multiple languages. One of the main results of this work is the definition of a relation between broad semantic classes and LCS meaning components. Our acquisition program---LEXICALL---takes, as input, the result of previous work on verb classification and thematic grid tagging, and outputs LCS representations for different languages. These representations have been ported into English, Arabic and Spanish lexicons, each containing approximately 9000 verbs. We are currently using these lexicons in operational foreign language tutoring and machine translation systems. (Also cross-referenced as UMIACS-TR-97-09)
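
    As a rough illustration of the acquisition idea (not LEXICALL's actual code or class inventory), one might map a verb's broad semantic class to a skeletal LCS template and instantiate it with the verb's thematic grid:

```python
# Hypothetical class-to-LCS-skeleton mapping; real LCS meaning components and
# the verb class inventory are far richer than this sketch suggests.
CLASS_TEMPLATES = {
    "motion": "(GO Thing-{agent} (PATH TOWARD Thing-{goal}))",
    "caused_change": "(CAUSE Thing-{agent} (GO-ident Thing-{theme} (TO State-{state})))",
}

def build_lcs(verb: str, semantic_class: str, thematic_grid: dict) -> str:
    """Instantiate the class template with the verb's thematic-grid arguments."""
    return f"{verb}: " + CLASS_TEMPLATES[semantic_class].format(**thematic_grid)

# The same class/grid pairing yields parallel entries across languages.
print(build_lcs("enter", "motion", {"agent": "X", "goal": "Y"}))
print(build_lcs("entrar", "motion", {"agent": "X", "goal": "Y"}))
```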

    Development of Cross-Linguistic Syntactic and Semantic Parameters for Parsing and Generation

    This document reports on research conducted at the University of Maryland for the Korean/English Machine Translation (MT) project. The translation approach adopted here is interlingual, i.e., a single underlying representation called Lexical Conceptual Structure (LCS) is used for both Korean and English. The primary focus of this investigation concerns the notion of 'parameterization', i.e., a mechanism that accounts for both syntactic and lexical-semantic distinctions between Korean and English. We present our assumptions about the syntactic structure of Korean-type languages vs. English-type languages and describe our investigation of syntactic parameterization for distinguishing between these two types of languages. We also present the details of the LCS structure and describe how this representation is parameterized so that it accommodates both languages. We address critical issues concerning interlingual machine translation such as locative postpositions and the dividing line between the interlingua and the knowledge representation. Difficulties in translation and transliteration of Korean are discussed, and complex morphological properties of Korean are presented. Finally, we describe recent work on lexical acquisition and conclude with a discussion of two hypotheses concerning semantic classification that are currently being tested. (Also cross-referenced as UMIACS-TR-94-26)
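
    For example, a single head-direction parameter is one way to capture an English-type vs. Korean-type contrast such as prepositions vs. locative postpositions; the sketch below is illustrative and uses made-up parameter names, not the project's actual parameter set.

```python
# Illustrative parameter table: linearization of an adposition phrase driven by
# a head-direction parameter (names and values are simplified for the sketch).
SYNTAX_PARAMS = {
    "english": {"head_final": False},   # prepositions: "to school"
    "korean":  {"head_final": True},    # postpositions: "hakkyo-ey" (school-to)
}

def linearize_adposition_phrase(adposition: str, noun: str, language: str) -> str:
    """Order an adposition and its complement according to the head parameter."""
    head_final = SYNTAX_PARAMS[language]["head_final"]
    return f"{noun}{adposition}" if head_final else f"{adposition} {noun}"

print(linearize_adposition_phrase("to", "school", "english"))   # to school
print(linearize_adposition_phrase("-ey", "hakkyo", "korean"))   # hakkyo-ey
```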

    UNITRAN: A Principle-Based Approach to Machine Translation

    Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed in part because the interaction effects of complex phenomena made translation appear unmanageable. Later approaches to the problem have succeeded (although only bilingually), but are based on many language-specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based descriptions of grammar rather than rule-oriented descriptions. The model that has been constructed is based on abstract principles as developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. Thus, the grammar is viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.
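
    The contrast with rule-based systems can be pictured as a small set of reusable constraint functions rather than many construction-specific rules; the sketch below uses GB principle names but placeholder checks, and does not reproduce UNITRAN's implementation.

```python
# Principles as modular, language-independent constraints over candidate
# structures (placeholder checks only; not UNITRAN's actual machinery).
def case_filter(structure: dict) -> bool:
    """Every overt NP must receive Case."""
    return all(np.get("case") is not None for np in structure.get("nps", []))

def theta_criterion(structure: dict) -> bool:
    """Every argument NP must bear a theta-role."""
    return all(np.get("theta") is not None for np in structure.get("nps", []))

PRINCIPLES = [case_filter, theta_criterion]

def well_formed(structure: dict) -> bool:
    """A structure is admitted only if it satisfies every principle."""
    return all(principle(structure) for principle in PRINCIPLES)

candidate = {"nps": [{"case": "NOM", "theta": "agent"},
                     {"case": "ACC", "theta": "theme"}]}
print(well_formed(candidate))  # True
```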

    Knowledge Graphs Effectiveness in Neural Machine Translation Improvement

    Neural Machine Translation (NMT) systems require a massive amount of training data. Maintaining semantic relations between words during the translation process yields more accurate target-language output from NMT. Although difficult to achieve from training data alone, it is possible to leverage Knowledge Graphs (KGs) to retain source-language semantic relations in the corresponding target-language translation. The core idea is to use KG entity relations as embedding constraints to improve the mapping from source to target. This paper describes two embedding constraints, both of which employ Entity Linking (EL)---assigning a unique identity to entities---to associate words in training sentences with those in the KG: (1) a monolingual embedding constraint that supports an enhanced semantic representation of the source words through access to relations between entities in a KG; and (2) a bilingual embedding constraint that forces entity relations in the source language to be carried over to the corresponding entities in the target-language translation. The method is evaluated for English-Spanish translation, exploiting Freebase as a source of knowledge. Our experimental results show that exploiting KG information not only decreases the number of unknown words in the translation but also improves translation quality.
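
    The two constraints can be pictured as auxiliary penalty terms added to the NMT training objective; the sketch below uses toy numpy vectors and made-up entity names, and does not reproduce the paper's exact formulation.

```python
# Rough sketch of the two constraint ideas as auxiliary penalties. Entity
# linking, the KG, and the NMT loss itself are assumed to exist elsewhere.
import numpy as np

def mono_constraint(word_vec, linked_entity_vec):
    """Monolingual: pull a source word embedding toward its linked KG entity."""
    return np.sum((word_vec - linked_entity_vec) ** 2)

def bi_constraint(src_head, src_tail, tgt_head, tgt_tail):
    """Bilingual: the offset (relation) between two source-side entities should
    be mirrored by the corresponding target-side entities."""
    return np.sum(((src_tail - src_head) - (tgt_tail - tgt_head)) ** 2)

rng = np.random.default_rng(0)
e = {name: rng.normal(size=8) for name in
     ["w_src", "ent_src", "madrid_en", "spain_en", "madrid_es", "espana_es"]}

aux_loss = (mono_constraint(e["w_src"], e["ent_src"])
            + bi_constraint(e["madrid_en"], e["spain_en"],
                            e["madrid_es"], e["espana_es"]))
print(f"auxiliary constraint loss: {aux_loss:.3f}")
```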

    Handling Translation Divergences in Generation-Heavy Hybrid Machine Translation

    This paper describes a novel approach for handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The approach depends on the existence of rich target-language resources such as word lexical semantics, including information about categorial variations and subcategorization frames. These resources are used to generate multiple structural variations from a target-glossed lexico-syntactic representation of the source-language sentence. The multiple structural variations account for different translation divergences. The overgeneration of the approach is constrained by a target-language model using corpus-based statistics. The exploitation of target-language resources (symbolic and statistical) to handle a problem usually reserved for Transfer and Interlingual MT is useful for translation from structurally divergent source languages with scarce linguistic resources. A preliminary evaluation of the application of this approach to Spanish-English MT shows it to be extremely promising. The approach, however, is not limited to MT, as it can be extended to monolingual NLG applications such as summarization. (Also UMIACS-TR-2002-23 and LAMP-TR-08)
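
    A toy picture of the overgenerate-and-rank idea, using a classic manner/path divergence; the candidate sentences and the unigram "language model" below are stand-ins for the system's symbolic generation and corpus-based statistics.

```python
# Sketch: produce several structural variants for a divergent construction,
# then let target-language statistics pick the preferred realization.
# e.g. Spanish "entrar ... flotando" vs. English conflated "float into".
candidates = [
    "the bottle entered the cave floating",
    "the bottle floated into the cave",
]

# Toy unigram scores standing in for a corpus-based target-language model.
UNIGRAM_LOGPROB = {"floated": -4.0, "into": -2.5, "entered": -5.5,
                   "floating": -6.0, "the": -1.0, "bottle": -7.0, "cave": -7.5}

def lm_score(sentence: str) -> float:
    return sum(UNIGRAM_LOGPROB.get(w, -12.0) for w in sentence.split())

best = max(candidates, key=lm_score)
print(best)  # the statistically preferred structural variant
```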

    Efficient Parsing for Korean and English: A Parameterized Message Passing Approach

    This article presents an efficient, implemented approach to cross-linguistic parsing based on Government-Binding (GB) Theory (Chomsky, 1986) and subsequent work. One of the drawbacks of alternative GB-based parsing approaches is that they generally adopt a filter-based paradigm. These approaches typically generate all possible candidate structures of the sentence that satisfy X-bar theory, and then subsequently apply filters in order to eliminate those structures that violate GB principles. (See, for example, (Abney, 1989; Correa, 1991; Dorr, 1993; Fong, 1991).) The current approach provides an alternative to filter-based designs which avoids these difficulties by applying principles to descriptions of structures, rather than generating all candidate structures and filtering them afterward.
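
    The difference from the filter-based paradigm can be sketched as pruning partial descriptions as soon as a principle is violated, instead of building every complete candidate first; the attachment options and the constraint below are invented for illustration and are not the message-passing algorithm itself.

```python
# Toy contrast with filter-based parsing: principles are applied to partial
# descriptions as words are attached, pruning violations immediately.
def attachment_options(word: str) -> list:
    """Hypothetical attachment choices for a word (drastically simplified)."""
    return ["subject", "object"] if word in ("dogs", "cats") else ["head", "modifier"]

def violates_principles(partial: list) -> bool:
    """Placeholder principle check, e.g. at most one subject attachment."""
    return partial.count("subject") > 1

def parse_incremental(words: list) -> list:
    analyses = [[]]
    for word in words:
        extended = [a + [att] for a in analyses for att in attachment_options(word)]
        # Principles applied now, to descriptions, rather than to finished trees.
        analyses = [a for a in extended if not violates_principles(a)]
    return analyses

print(parse_incremental(["dogs", "cats", "chase"]))
```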

    LonXplain: Lonesomeness as a Consequence of Mental Disturbance in Reddit Posts

    Social media is a potential source of information from which latent mental states can be inferred through Natural Language Processing (NLP). While narrating real-life experiences, social media users convey their feelings of loneliness or an isolated lifestyle, which impacts their mental well-being. Existing literature on psychological theories points to loneliness as the major consequence of interpersonal risk factors, underscoring the need to investigate loneliness as a major aspect of mental disturbance. We formulate lonesomeness detection in social media posts as an explainable binary classification problem, identifying at-risk users and suggesting the need for resilience-building as early intervention. To the best of our knowledge, there is no existing explainable dataset, i.e., one with human-readable, annotated text spans, to facilitate further research and development in detecting loneliness that causes mental disturbance. In this work, three experts (a senior clinical psychologist, a rehabilitation counselor, and a social NLP researcher) define annotation schemes and perplexity guidelines to mark the presence or absence of lonesomeness, along with text spans in the original posts as explanations, in 3,521 Reddit posts. We plan to publicly release our dataset, LonXplain, along with traditional classifiers as baselines, via GitHub.
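
    One might store such annotations as records pairing a binary label with explanation spans; the field names and example below are illustrative, not the released schema of LonXplain.

```python
# Hypothetical record layout for an explainable loneliness-detection corpus.
import json

record = {
    "post_id": "example_001",
    "text": "Nobody checks on me anymore and I feel completely alone.",
    "lonesomeness": 1,                 # binary label
    "explanation_spans": [[32, 55]],   # character offsets of an annotated span
}

start, end = record["explanation_spans"][0]
print(json.dumps(record, indent=2))
print("span text:", record["text"][start:end])  # "I feel completely alone"
```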

    On automatic filtering of multilingual texts

    An emerging requirement to sift through the increasing flood of text information has led to the rapid development of information filtering technology in the past five years. This study introduces novel approaches for filtering texts regardless of their source language. We begin with a brief description of related developments in text filtering and multilingual information retrieval. We then present three alternative approaches to selecting texts from a multilingual information stream which represent a logical evolution from existing techniques in related disciplines. Finally, a practical automated performance evaluation technique is proposed.
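
    None of the paper's three approaches is reproduced here; purely to illustrate the task setting, the sketch below filters a multilingual stream against a profile using character n-grams, which avoids language-specific tokenization.

```python
# Illustration of language-independent filtering: score incoming documents
# against a profile via cosine similarity over character n-grams.
from collections import Counter
import math

def char_ngrams(text: str, n: int = 3) -> Counter:
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[g] * b[g] for g in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

profile = char_ngrams("machine translation interlingua lexical semantics")
stream = ["traduccion automatica basada en interlingua",    # Spanish, relevant
          "recette de cuisine pour un gateau au chocolat"]  # French, irrelevant

for doc in stream:
    print(f"{cosine(profile, char_ngrams(doc)):.3f}  {doc}")
```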